CPU

Constructing a CPU

The basic steps are:

  1. Add Register Bank/File
  2. Add ALU (Arithmetic Logic Unit)
  3. Add Cache

On I/O: There are only two ways for external devices to communicate with the CPU:

  1. Interrupt: The device tells the CPU to stop what it’s doing and respond. The CPU can then reject the interrupt (do nothing), accept it (respond), or delay it (respond later).
  2. Polling: The CPU periodically asks each device whether it has input.

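Interrupt v. Polling: If a device fails, it never raises an interrupt, so an interrupt-driven CPU won’t notice the failure. With polling, the failed device stops answering when asked, so the failure is caught.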
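A minimal Python sketch of the failure argument above, assuming a hypothetical `Device` model in which a failed device neither answers a poll nor raises an interrupt. `polling_round` and `interrupt_round` are illustrative names, not a real driver API; the interrupt side is simulated by collecting whichever devices actually signal.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Device:
    """Hypothetical I/O device: a queue of pending inputs plus a failure flag."""
    name: str
    pending: List[str] = field(default_factory=list)
    failed: bool = False

    def answer_poll(self) -> Optional[str]:
        # A failed device cannot answer the CPU's status query at all.
        if self.failed:
            raise TimeoutError(f"{self.name} did not respond")
        return self.pending.pop(0) if self.pending else None

    def signal_interrupt(self) -> Optional[str]:
        # A failed device simply never signals, which looks the same as "no input".
        if self.failed or not self.pending:
            return None
        return self.pending.pop(0)


def polling_round(devices: List[Device]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """The CPU asks every device in turn; unresponsive devices are reported."""
    inputs, unresponsive = [], []
    for dev in devices:
        try:
            value = dev.answer_poll()
        except TimeoutError:
            unresponsive.append(dev.name)
            continue
        if value is not None:
            inputs.append((dev.name, value))
    return inputs, unresponsive


def interrupt_round(devices: List[Device]) -> List[Tuple[str, str]]:
    """Simulated interrupt lines: only devices that actually signal are seen."""
    received = []
    for dev in devices:
        value = dev.signal_interrupt()
        if value is not None:
            received.append((dev.name, value))
    return received


# Usage: a keyboard with one key press and a disk that has failed.
print(polling_round([Device("keyboard", ["k"]), Device("disk", ["d"], failed=True)]))
# → ([('keyboard', 'k')], ['disk'])
print(interrupt_round([Device("keyboard", ["k"]), Device("disk", ["d"], failed=True)]))
# → [('keyboard', 'k')]
```

Polling reports the dead disk; the interrupt version only ever sees the keyboard, so the failure goes unnoticed.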

Computer Architecture

Different Instruction Formats Require Different CPU Architectures

Suppose this one-operand (accumulator) instruction format:

AC <- AC + 100

Executing this instruction requires a memory access. If the next instruction needs the same memory cell, reading it from memory again may cause a huge delay.

Thus, it’s better to keep data that may be needed again in an intermediate register.
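A minimal Python sketch of the idea, assuming (as "the same memory cell" suggests) that `100` in `AC <- AC + 100` names a memory address, so the instruction means AC <- AC + M[100]. The `Memory` class and both `run_*` functions are hypothetical; the sketch only counts slow memory reads and assumes no stores, so the value kept in the register can never go stale here.

```python
class Memory:
    """Toy main memory that counts reads, since each read is slow."""

    def __init__(self, cells):
        self.cells = dict(cells)
        self.reads = 0

    def read(self, addr):
        self.reads += 1
        return self.cells[addr]


def run_without_register(mem, addresses, ac=0):
    """Each instruction AC <- AC + M[addr] goes back to memory for its operand."""
    for addr in addresses:
        ac += mem.read(addr)
    return ac


def run_with_register(mem, addresses, ac=0):
    """An intermediate register keeps the last memory cell read (no stores assumed)."""
    reg_addr, reg_val = None, None
    for addr in addresses:
        if addr != reg_addr:  # not in the register yet: one slow memory read
            reg_addr, reg_val = addr, mem.read(addr)
        ac += reg_val
    return ac


# Two consecutive instructions that both need M[100].
slow, fast = Memory({100: 7}), Memory({100: 7})
print(run_without_register(slow, [100, 100]), slow.reads)  # 14 2
print(run_with_register(fast, [100, 100]), fast.reads)     # 14 1
```

Both versions compute the same AC; the register version touches memory half as often for this pair.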

Memory-Memory Architectures (Harvard)

Operation Operand1, Operand2, Operand3, Next Instruction
Operation Operand1, Operand2, Operand3
Operation Operand1, Operand2

Register-Memory Architectures (Harvard)

Operation Operand

Register-Memory Architectures (Von Neumann)

Operation Operand

Simple Fetch Architecture

Fetch: During the fetch cycle, the CPU retrieves the instruction from memory.

Instruction Fetch:

Instruction Decode:

  1. Find out what the instruction is.
  2. Access the registers to read its parameters (operands).
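The fetch and decode steps above can be sketched as a toy fetch-decode-execute loop in Python. The instruction format `(op, dst, src1, src2)`, the `LOADI`/`ADD`/`SUB` opcodes, and the four-register file are all hypothetical choices for illustration, not a real ISA.

```python
# Hypothetical ALU operations for a toy register machine.
ALU_OPS = {
    "ADD": lambda x, y: x + y,
    "SUB": lambda x, y: x - y,
}


def run(program, num_regs=4):
    """Toy fetch-decode-execute loop over a hypothetical instruction format."""
    regs = [0] * num_regs
    pc = 0
    while pc < len(program):
        # Fetch: read the instruction the program counter points to.
        instr = program[pc]
        pc += 1
        # Decode (1): find out what the instruction is.
        op, dst, src1, src2 = instr
        if op == "LOADI":
            # LOADI carries an immediate value in src1; no register read needed.
            regs[dst] = src1
            continue
        if op not in ALU_OPS:
            raise ValueError(f"unknown opcode {op!r}")
        # Decode (2): access the register file to read the parameters.
        x, y = regs[src1], regs[src2]
        # Execute in the ALU, then write the result back to the register file.
        regs[dst] = ALU_OPS[op](x, y)
    return regs


program = [
    ("LOADI", 0, 5, None),  # r0 = 5
    ("LOADI", 1, 3, None),  # r1 = 3
    ("ADD", 2, 0, 1),       # r2 = r0 + r1
    ("SUB", 3, 0, 1),       # r3 = r0 - r1
]
print(run(program))  # [5, 3, 8, 2]
```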

How to Speed Up Computers

Superscalar Execution

Until now, we’ve been using specialized modules for fetching, decoding, executing, and writeback.

Now, suppose we instead have 4 modules, each of which can perform all of these operations.

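F1 D1 E1 WB1
F2 D2 E2 WB2
F3 D3 E3 WB3
F4 D4 E4 WB4
F5 D5 E5 WB5
F6 D6 E6 WB6
F7 D7 E7 WB7
F8 D8 E8 WB8

(F = fetch, D = decode, E = execute, WB = writeback; the number is the instruction.)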

Limitation: Instruction dependency. Real programs can’t be written so that no instruction depends on another; when an instruction needs the result of an earlier one, the two can’t run in parallel.
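To see why dependency is the limit, here is a minimal Python sketch of greedy in-order issue on a hypothetical 4-wide machine. Each instruction is written as `(destination, sources)`; an instruction moves to the next cycle if the current group is full or if it reads a register written earlier in the same group. Only read-after-write dependencies are checked and results are assumed ready one cycle later, so this is a simplification, and `issue_groups` is an illustrative name.

```python
def issue_groups(instrs, width=4):
    """Greedy in-order issue: pack up to `width` instructions per cycle.

    instrs: list of (dest_register, (source_registers, ...)).
    An instruction may not share a cycle with an earlier instruction whose
    result it reads (read-after-write only; other hazards are ignored).
    """
    groups, current, written = [], [], set()
    for i, (dest, srcs) in enumerate(instrs):
        depends = any(s in written for s in srcs)
        if len(current) == width or depends:
            groups.append(current)          # close this cycle's group
            current, written = [], set()    # earlier results are now available
        current.append(i)
        written.add(dest)
    if current:
        groups.append(current)
    return groups


independent = [(f"r{i}", ("a", "b")) for i in range(8)]
chain = [("r1", ("r0",)), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ("r3",))]
print(issue_groups(independent))  # [[0, 1, 2, 3], [4, 5, 6, 7]] -> 2 cycles
print(issue_groups(chain))        # [[0], [1], [2], [3]] -> 4 cycles
```

Eight independent instructions fill the four modules in 2 cycles, but a chain of 4 dependent instructions needs 4 cycles, so three modules sit idle each time.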

External Cache

Main memory is very slow; one solution is to add an external cache.

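Suppose it takes t_1 to access the cache and t_2 to access main memory, and let h be the hit ratio (the fraction of accesses found in the cache).

The average access time (AAT) would be

\text{AAT} = h t_1 + (1-h) t_2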

For example, suppose h = 0.9, t_1 = 1, t_2 = 100

\text{AAT} = 0.9 \times 1 + 0.1 \times 100 = 10.9

Professor’s Personal Experience: In the real world, the hit ratio lies between 0.7 and 1.0.

The lie: The formula above is too optimistic. On a miss, we first check the cache and only then go to main memory to pull the data out, so a miss costs t_1 + t_2, not just t_2:

\boxed{ \text{AAT} = h t_1 + (1-h)(t_1 + t_2) }

If we plug in h = 0.9, t_1 = 1, t_2 = 100, we’d get:

\text{AAT} = 0.9 \times 1 + 0.1 \times 101 = 11
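Both AAT formulas can be checked with a short Python sketch; `aat_simple` and `aat_realistic` are illustrative names for the first formula and the boxed one. Note that the boxed formula simplifies to t_1 + (1-h) t_2, since every access pays t_1.

```python
def aat_simple(h, t1, t2):
    """First model: a hit costs t1, a miss costs only t2."""
    return h * t1 + (1 - h) * t2


def aat_realistic(h, t1, t2):
    """Boxed model: a miss first checks the cache (t1), then goes to memory (t2)."""
    return h * t1 + (1 - h) * (t1 + t2)


print(round(aat_simple(0.9, 1, 100), 2))     # 10.9
print(round(aat_realistic(0.9, 1, 100), 2))  # 11.0
```

Across the professor’s range of hit ratios, the boxed model goes from 31 at h = 0.7 down to 1 at h = 1.0 (with t_1 = 1, t_2 = 100), so the hit ratio dominates the average.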

Q: Why can’t we put the cache in the CPU?

A: Space.

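CISC v. RISC: CISC includes many specialized instructions, while RISC keeps a smaller set of general ones.

ADD AX,1 # ADD is a general instruction
INC AX   # INC (increment) is a specialized instruction for adding 1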

Because RISC drops specialized instructions, the CPU’s control logic gets simpler, which frees up space in the CPU.

By switching to RISC, we can put cache in the CPU.

Not only that, but we can split the cache into a smaller cache and a bigger cache.
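A hedged sketch of what splitting can buy: the Python below applies the boxed AAT model from the External Cache discussion level by level, assuming every access checks the smaller cache first, misses check the bigger cache, and its misses go to main memory. The function names and all numbers (h_2 = 0.8, a bigger cache with access time 10) are hypothetical, chosen only for illustration.

```python
def aat_one_level(h, t1, t2):
    """Boxed model: every access checks the cache (t1); misses also pay t2."""
    return h * t1 + (1 - h) * (t1 + t2)


def aat_two_level(h1, t1, h2, t2, t3):
    """Same model applied twice (hypothetical extension).

    h1, t1: hit ratio and access time of the smaller cache
    h2, t2: hit ratio (among small-cache misses) and access time of the bigger cache
    t3:     main-memory access time
    """
    # Cost of an access that missed the small cache: it now faces the bigger cache.
    miss_cost = aat_one_level(h2, t2, t3)
    return h1 * t1 + (1 - h1) * (t1 + miss_cost)


print(round(aat_one_level(0.9, 1, 100), 2))           # 11.0
print(round(aat_two_level(0.9, 1, 0.8, 10, 100), 2))  # 4.0
```

With these made-up numbers, a miss in the small cache usually lands in the bigger cache instead of main memory, which is where the improvement comes from.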

Accessing cache is random.

On CAM (Content-Addressable Memory)

The technology of the cache and the CPU must be the same. For example, if the CPU is built in a 1 nm process, the cache must also be 1 nm. Technically, the CPU’s feature size can be larger than the cache’s (e.g., a 1 nm CPU with a 0.5 nm cache is valid), but it’ll be a waste of money.